
Creators/Authors contains: "Jin, Di"


  1. While most network embedding techniques model the proximity between nodes in a network, recently there has been significant interest in structural embeddings that are based on node equivalences, a notion rooted in sociology: equivalences or positions are collections of nodes that have similar roles (i.e., similar functions, ties, or interactions with nodes in other positions) irrespective of their distance or reachability in the network. Unlike the proximity-based methods that are rigorously evaluated in the literature, the evaluation of structural embeddings is less mature. It relies on small synthetic or real networks with labels that are not perfectly defined, and its connection to sociological equivalences has hitherto been vague and tenuous. With new node embedding methods being developed at a breakneck pace, proper evaluation and systematic characterization of existing approaches will be essential to progress. To fill this gap, we set out to understand what types of equivalences structural embeddings capture. We are the first to contribute rigorous intrinsic and extrinsic evaluation methodology for structural embeddings, along with carefully designed, diverse datasets of varying sizes. We observe a number of different evaluation variables that can lead to different results (e.g., choice of similarity measure, classifier, and label definitions). We find that degree distributions within nodes’ local neighborhoods can lead to simple yet effective baselines in their own right and guide the future development of structural embedding. We hope that our findings can influence the design of further node embedding methods and also pave the way for more comprehensive and fair evaluation of structural embedding methods.
  2. Multi-source entity linkage focuses on integrating knowledge from multiple sources by linking the records that represent the same real-world entity. This is critical in high-impact applications such as data cleaning and user stitching. The state-of-the-art entity linkage pipelines mainly depend on supervised learning that requires abundant training data. However, collecting well-labeled training data becomes expensive when the data from many sources arrives incrementally over time. Moreover, the trained models can easily overfit to specific data sources, and thus fail to generalize to new sources due to significant differences in data and label distributions. To address these challenges, we present AdaMEL, a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage. AdaMEL models the attribute importance that is used to match entities through an attribute-level self-attention mechanism, and leverages the massive unlabeled data from new data sources through domain adaptation to make it generic and data-source agnostic. In addition, AdaMEL is capable of incorporating an additional set of labeled data to more accurately integrate data sources with different attribute importance. Extensive experiments show that our framework achieves state-of-the-art results with 8.21% improvement on average over methods based on supervised learning. It is also more stable across different sets of data sources and requires less runtime.
  3. null (Ed.)
  4. Liane Lewin-Eytan, David Carmel (Ed.)
    Graph convolutional networks (GCNs), which obtain node embeddings by integrating high-order neighborhood information through stacked graph convolution layers, have demonstrated great power in many network analysis tasks such as node classification and link prediction. However, GCNs have a fundamental weakness: topological limitations, including over-smoothing and local homophily of topology, limit their ability to represent networks. Existing studies that address these topological limitations typically focus only on the convolution of features on network topology, which inevitably relies heavily on network structures. Moreover, most networks are text-rich, so it is important to integrate not only document-level information but also the local text information, which is particularly significant yet often ignored by existing methods. To overcome these limitations, we propose BiTe-GCN, a novel GCN architecture that models text-rich networks via bidirectional convolution of topology and features. Specifically, we first transform the original text-rich network into an augmented bi-typed heterogeneous network, capturing both the global document-level information and the local text-sequence information from texts. We then introduce discriminative convolution mechanisms, which perform convolution on this augmented bi-typed network, realizing the convolutions of topology and features together in the same system and automatically learning the different contributions of these two parts (i.e., the network part and the text part) for the given learning objectives. Extensive experiments on text-rich networks demonstrate that our new architecture substantially outperforms state-of-the-art methods. Moreover, this architecture can also be applied to e-commerce search scenarios such as JD search, and experiments on a JD dataset show the superiority of the proposed architecture over the baseline methods.
  5. Following the significant coastal changes caused by Hurricane Sandy in 2012, engineered berm-dunes were constructed along the New Jersey coastline to enhance protection from future storms. Following construction, property values on Long Beach Island, NJ, increased in three beachfront communities. The projects were financed entirely through federal disaster assistance, but a percentage of future maintenance costs must be covered by local communities. Whether communities are willing or able to contribute financially to maintenance remains unclear because (i) some homeowners prefer ocean views over the protection afforded by the berm-dune structures, and (ii) stakeholder risk perceptions can change over time. To investigate the relationships between berm-dune geometries, values of coastal protection, and ocean view values, we developed a geo-economic model of the natural and anthropogenic processes that shape beach and dune morphology. The model results suggest that coastal communities may exhibit significant differences in their capabilities to maintain engineered dunes depending on stakeholder wealth and risk perception. In particular, communities with strong preferences for ocean views are less likely to maintain large-scale berm-dune structures over the long term. If these structures are abandoned, the vulnerability of the coast to future storms will increase.
  6. null (Ed.)
  7. The ocean's twilight zone (TZ) is a vast, globe-spanning region of the ocean. It is home to myriad fishes and invertebrates; mid-water fishes alone may constitute 10 times more biomass than all current ocean wild-caught fisheries combined. Life in the TZ supports ocean food webs and plays a critical role in carbon capture and sequestration. Yet the ecological roles that mesopelagic animals play in the ocean remain enigmatic. This knowledge gap has stymied efforts to determine the effects that extraction of mesopelagic biomass by industrial fisheries, or alterations due to climate shifts, may have on ecosystem services provided by the open ocean. We propose to develop a scalable, distributed observation network to provide sustained interrogation of the TZ in the northwest Atlantic. The network will leverage a “tool-chest” of emerging and enabling technologies including autonomous, unmanned surface and underwater vehicles and swarms of low-cost “smart” floats. Connectivity among in-water assets will allow rapid assimilation of data streams to inform adaptive sampling efforts. The TZ observation network will demonstrate a bold new step towards the goal of continuously observing vast regions of the deep ocean, significantly improving TZ biomass estimates and understanding of the TZ's role in supporting ocean food webs and sequestering carbon.